AITopics | visual interpretation

Collaborating Authors

visual interpretation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Seeing the Big Picture: Evaluating Multimodal LLMs' Ability to Interpret and Grade Handwritten Student Work

Henkel, Owen, Roberts, Bill, Jaffe, Doug, Holt, Laurence

arXiv.org Artificial IntelligenceOct-8-2025

Recent advances in multimodal large language models (MLLMs) raise the question of their potential for grading, analyzing, and offering feedback on handwritten student classwork. This capability would be particularly beneficial in elementary and middle-school mathematics education, where most work remains handwritten, because seeing students' full working of a problem provides valuable insights into their learning processes, but is extremely time-consuming to grade. We present two experiments investigating MLLM performance on handwritten student mathematics classwork. Experiment A examines 288 handwritten responses from Ghanaian middle school students solving arithmetic problems with objective answers. In this context, models achieved near-human accuracy (95%, k = 0.90) but exhibited occasional errors that human educators would be unlikely to make. Experiment B evaluates 150 mathematical illustrations from American elementary students, where the drawings are the answer to the question. These tasks lack single objective answers and require sophisticated visual interpretation as well as pedagogical judgment in order to analyze and evaluate them. We attempted to separate MLLMs' visual capabilities from their pedagogical abilities by first asking them to grade the student illustrations directly, and then by augmenting the image with a detailed human description of the illustration. We found that when the models had to analyze the student illustrations directly, they struggled, achieving only k = 0.20 with ground truth scores, but when given human descriptions, their agreement levels improved dramatically to k = 0.47, which was in line with human-to-human agreement levels. This gap suggests MLLMs can "see" and interpret arithmetic work relatively well, but still struggle to "see" student mathematical illustrations.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.05538

Genre: Research Report (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.88)
Education > Educational Setting > K-12 Education > Middle School (0.75)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Guiding Multimodal Large Language Models with Blind and Low Vision People Visual Questions for Proactive Visual Interpretations

Penuela, Ricardo Gonzalez, Arias-Russi, Felipe, Capriles, Victor

arXiv.org Artificial IntelligenceOct-3-2025

Multimodal large language models (MLLMs) have been integrated into visual interpretation applications to support Blind and Low Vision (BLV) users because of their accuracy and ability to provide rich, human-like interpretations. However, these applications often default to comprehensive, lengthy descriptions regardless of context. This leads to inefficient exchanges, as users must go through irrelevant details rather than receiving the specific information they are likely to seek. To deliver more contextually-relevant information, we developed a system that draws on historical BLV users questions. When given an image, our system identifies similar past visual contexts from the VizWiz-LF dataset and uses the associated questions to guide the MLLM generate descriptions more relevant to BLV users. An evaluation with three human labelers who revised 92 context-aware and context-free descriptions showed that context-aware descriptions anticipated and answered users' questions in 76.1% of cases (70 out of 92) and were preferred in 54.4% of comparisons (50 out of 92). Our paper reviews, and data analysis are publicly available in a Github repository at https://github.com/rgonzalezp/guiding-multimodal-large-language-models-with-blind-and-low-vision-people-visual-questions .

information, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.01576

Country: North America (0.15)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Don't Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series

Loeffler, Christoffer, Lai, Wei-Cheng, Eskofier, Bjoern, Zanca, Dario, Schmidt, Lukas, Mutschler, Christopher

arXiv.org Artificial IntelligenceSep-15-2023

The correct interpretation and understanding of deep learning models are essential in many applications. Explanatory visual interpretation approaches for image, and natural language processing allow domain experts to validate and understand almost any deep learning model. However, they fall short when generalizing to arbitrary time series, which is inherently less intuitive and more diverse. Whether a visualization explains valid reasoning or captures the actual features is difficult to judge. Hence, instead of blind trust, we need an objective evaluation to obtain trustworthy quality metrics. We propose a framework of six orthogonal metrics for gradient-, propagation- or perturbation-based post-hoc visual interpretation methods for time series classification and segmentation tasks. An experimental study includes popular neural network architectures for time series and nine visual interpretation methods. We evaluate the visual interpretation methods with diverse datasets from the UCR repository and a complex, real-world dataset and study the influence of standard regularization techniques during training. We show that none of the methods consistently outperforms others on all metrics, while some are sometimes ahead. Our insights and recommendations allow experts to choose suitable visualization techniques for the model and task.

dataset, deep visual interpretation, saliency map, (13 more...)

arXiv.org Artificial Intelligence

2203.07861

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(21 more...)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How Useful Are the Machine-Generated Interpretations to General Users? A Human Evaluation on Guessing the Incorrectly Predicted Labels

Shen, Hua, Huang, Ting-Hao Kenneth

arXiv.org Artificial IntelligenceAug-27-2020

Explaining to users why automated systems make certain mistakes is important and challenging. Researchers have proposed ways to automatically produce interpretations for deep neural network models. However, it is unclear how useful these interpretations are in helping users figure out why they are getting an error. If an interpretation effectively explains to users how the underlying deep neural network model works, people who were presented with the interpretation should be better at predicting the model's outputs than those who were not. This paper presents an investigation on whether or not showing machine-generated visual interpretations helps users understand the incorrectly predicted labels produced by image classifiers. We showed the images and the correct labels to 150 online crowd workers and asked them to select the incorrectly predicted labels with or without showing them the machine-generated visual interpretations. The results demonstrated that displaying the visual interpretations did not increase, but rather decreased, the average guessing accuracy by roughly 10%.

artificial intelligence, interpretation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2008.11721

Country: North America > United States > Pennsylvania > Centre County > University Park (0.04)

Genre:

Research Report > Experimental Study (0.49)
Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Visual Interpretation for Artificial Intelligences [Video ] - TechAcute

#artificialintelligenceJul-19-2016, 01:55:42 GMT

Professor Fei Fei Li invests her time researching computer sciences at the Stanford University. She is Director of the Stanford AI Lab, Researcher in AI, computer vision, machine learning and cognitive neuroscience. We particularly loved her latest appearance on this TED Talks event about how systems learn how physical objects look and wanted to share this with you. YouTube: "Fei Fei Li: How we're teaching computers to understand pictures" by TED Talks

artificial intelligence, video understanding, visual interpretation, (1 more...)

#artificialintelligence

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.80)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.80)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.40)

Add feedback

Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets

Prakash, Aaditya

arXiv.org Machine LearningDec-24-2012

Self-Organizing Maps (SOM) are popular unsupervised artificial neural network used to reduce dimensions and visualize data. Visual interpretation from Self-Organizing Maps (SOM) has been limited due to grid approach of data representation, which makes inter-scenario analysis impossible. The paper proposes a new way to structure SOM. This model reconstructs SOM to show strength between variables as the threads of a cobweb and illuminate inter-scenario analysis. While Radar Graphs are very crude representation of spider web, this model uses more lively and realistic cobweb representation to take into account the difference in strength and length of threads. This model allows for visualization of highly unstructured dataset with large number of dimensions, common in Bigdata sources.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Machine Learning

1301.0289

Genre: Research Report (0.82)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback